Long-Term Visitation Value for Deep Exploration in Sparse-Reward Reinforcement Learning

Authors

Abstract

Reinforcement learning with sparse rewards is still an open challenge. Classic methods rely on getting feedback via extrinsic rewards to train the agent, and in situations where these occur very rarely the agent learns slowly or cannot learn at all. Similarly, if the agent receives also rewards that create suboptimal modes of the objective function, it will likely prematurely stop exploring. More recent methods add auxiliary intrinsic rewards to encourage exploration. However, auxiliary rewards lead to a non-stationary target for the Q-function. In this paper, we present a novel approach that (1) plans exploration actions far into the future by using a long-term visitation count, and (2) decouples exploration and exploitation by learning a separate function assessing the exploration value of the actions. Contrary to existing methods that use models of reward and dynamics, our approach is off-policy and model-free. We further propose new tabular environments for benchmarking exploration in reinforcement learning. Empirical results on classic benchmarks show that the proposed approach outperforms existing methods in environments with sparse rewards, especially in the presence of rewards that create suboptimal modes of the objective function. Results also suggest that our approach scales gracefully with the size of the environment.
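The two ingredients of the abstract can be illustrated with a minimal tabular sketch. This is not the authors' published algorithm: the names Q, W, and N, the intrinsic reward 1/sqrt(N), and the greedy combination rule are all illustrative assumptions. The point is that the exploration value W is itself learned with a Q-learning-style update, so count information propagates far into the future instead of acting as a one-step bonus.

```python
import numpy as np

n_states, n_actions = 50, 4
alpha, gamma_q, gamma_w, beta = 0.1, 0.99, 0.99, 1.0

Q = np.zeros((n_states, n_actions))  # exploitation value (extrinsic rewards only)
W = np.zeros((n_states, n_actions))  # long-term visitation (exploration) value
N = np.zeros((n_states, n_actions))  # state-action visitation counts

def act(s):
    # Behave greedily on a combination of exploitation and exploration
    # values (this trade-off rule is an assumption for illustration).
    return int(np.argmax(Q[s] + beta * W[s]))

def update(s, a, r, s_next):
    N[s, a] += 1
    # Off-policy update of the extrinsic Q-function (standard Q-learning).
    Q[s, a] += alpha * (r + gamma_q * Q[s_next].max() - Q[s, a])
    # W is trained like a Q-function, but its "reward" is a count-based
    # novelty signal, so it estimates the long-term visitation value of
    # actions rather than a one-step exploration bonus.
    r_intr = 1.0 / np.sqrt(N[s, a])
    W[s, a] += alpha * (r_intr + gamma_w * W[s_next].max() - W[s, a])

# Toy usage on a single transition, just to show the call pattern.
s = 0
a = act(s)
update(s, a, r=0.0, s_next=min(s + 1, n_states - 1))
```

Because the extrinsic Q-function never sees the intrinsic signal, its target stays stationary, which is the decoupling the abstract argues for.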


Similar Articles

A Study of Count-Based Exploration for Deep Reinforcement Learning

Count-based exploration algorithms are known to perform near-optimally when used in conjunction with tabular reinforcement learning (RL) methods for solving small discrete Markov decision processes (MDPs). It is generally thought that count-based methods cannot be applied in high-dimensional state spaces, since most states will only occur once. Recent deep RL exploration strategies are able to ...
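As a concrete illustration of the tabular setting this snippet refers to, a classic count-based strategy (in the style of the MBIE-EB bonus) augments the extrinsic reward with a term that decays with the visitation count. The constant beta and the array sizes below are assumptions; deep RL variants replace the exact counts with pseudo-counts derived from density models.

```python
import numpy as np

beta = 0.05
N = np.zeros((50, 4))  # state-action visitation counts

def bonus_reward(s, a, r_extrinsic):
    # Augment the extrinsic reward with a bonus that shrinks as the
    # state-action pair is visited more often.
    N[s, a] += 1
    return r_extrinsic + beta / np.sqrt(N[s, a])

r_aug = bonus_reward(s=3, a=1, r_extrinsic=0.0)
```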

EX2: Exploration with Exemplar Models for Deep Reinforcement Learning

Deep reinforcement learning algorithms have been shown to learn complex tasks using highly general policy classes. However, sparse reward problems remain a significant challenge. Exploration methods based on novelty detection have been particularly successful in such settings but typically require generative or predictive models of the observations, which can be difficult to train when the obse...

Curiosity-driven Exploration for Mapless Navigation with Deep Reinforcement Learning

This paper investigates exploration strategies of Deep Reinforcement Learning (DRL) methods to learn navigation policies for mobile robots. In particular, we augment the normal external reward for training DRL algorithms with intrinsic reward signals measured by curiosity. We test our approach in a mapless navigation setting, where the autonomous agent is required to navigate without the occupa...
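A minimal sketch of the curiosity-style intrinsic reward this snippet mentions, assuming the common recipe of using the prediction error of a learned forward model as the bonus. The linear model, feature dimension, and scale eta below stand in for the neural networks and hyperparameters used in practice.

```python
import numpy as np

dim, n_actions, eta, lr = 16, 4, 0.01, 1e-3
F = np.zeros((dim + n_actions, dim))  # forward model: (phi(s), a) -> phi(s')

def curiosity_reward(phi_s, a_onehot, phi_next, r_external):
    x = np.concatenate([phi_s, a_onehot])
    pred = x @ F
    err = phi_next - pred
    # Gradient step on the squared prediction error of the forward model.
    F += lr * np.outer(x, err)
    # Intrinsic bonus proportional to how surprising the transition was.
    return r_external + eta * float(err @ err)

# Toy usage with placeholder features and a one-hot action.
phi_s, phi_next = np.zeros(dim), 0.1 * np.ones(dim)
a_onehot = np.eye(n_actions)[0]
r = curiosity_reward(phi_s, a_onehot, phi_next, r_external=0.0)
```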

Towards Cognitive Exploration through Deep Reinforcement Learning for Mobile Robots

Exploration in an unknown environment is a core functionality for mobile robots. Learning-based exploration methods, including convolutional neural networks, provide excellent strategies without human-designed logic for feature extraction [1]. However, conventional supervised learning algorithms inevitably require considerable effort for labeling datasets. Scenes not included in the tr...

Diversity-Driven Exploration Strategy for Deep Reinforcement Learning

Efficient exploration remains a challenging research problem in reinforcement learning, especially when an environment contains large state spaces, deceptive local optima, or sparse rewards. To tackle this problem, we present a diversity-driven approach for exploration, which can be easily combined with both off- and on-policy reinforcement learning algorithms. We show that by simply adding a dist...
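A hedged sketch of the kind of distance term such a diversity-driven approach might add to the objective, assuming a KL divergence between the current policy's action distribution and a buffer of prior policies; the weight alpha and the buffer are illustrative assumptions rather than the paper's exact formulation.

```python
import numpy as np

alpha = 0.1

def kl(p, q, eps=1e-8):
    # KL divergence between two discrete action distributions.
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

def diversity_loss(base_loss, pi_current, prior_policies):
    # Subtracting the average distance to stored prior policies lowers
    # the loss when the current policy behaves more diversely.
    d = np.mean([kl(pi_current, pi_old) for pi_old in prior_policies])
    return base_loss - alpha * d

pi_now = np.array([0.7, 0.1, 0.1, 0.1])
priors = [np.array([0.25, 0.25, 0.25, 0.25])]
print(diversity_loss(1.0, pi_now, priors))
```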

Journal

Journal title: Algorithms

Year: 2022

ISSN: 1999-4893

DOI: https://doi.org/10.3390/a15030081